System, method, and computer program product for representing object relationships in a multidimensional space

ABSTRACT

A method and computer program product are presented for mapping n-dimensional input patterns into an m-dimensional space so as to preserve relationships that may exist in the n-dimensional space. A subset of the input patterns is chosen and mapped into the m-dimensional space using an iterative nonlinear mapping process. A set of locally defined neural networks is created, then trained in accordance with the mapping produced by the iterative process. Additional input patterns not in the subset are mapped into the m-dimensional space by using one of the local neural networks. In an alternative embodiment, the local neural networks are only used after training and use of a global neural network. The global neural network is trained in accordance with the mapping produced by the iterative process. Input patterns are initially projected into the m-dimensional space using the global neural network. Local neural networks are then used to refine the results of the global network.

CROSS-REFERENCE TO RELATED APPLICATIONS

[0001] This application claims the benefit of U.S. Provisional Application No. 60/191,108, filed Mar. 22, 2000 (incorporated in its entirety herein by reference).

BACKGROUND OF THE INVENTION

[0002] 1. Field of the Invention

[0003] The invention described herein relates to information representation, information cartography and data mining. The present invention also relates to pattern analysis and representation, and, in particular, representation of object relationships in a multidimensional space.

[0004] 2. Related Art

[0005] Reducing the dimensionality of large multidimensional data sets is an important objective in many data mining applications. High-dimensional spaces are sparse (Bellman, R. E., Adaptive Control Processes, Princeton University Press, Princeton (1961)), counter-intuitive (Wegman, E., J. Ann. Statist. 41:457-471 (1970)), and inherently difficult to understand, and their structure cannot be easily extracted with conventional graphical techniques. However, experience has shown that, regardless of origin, most multivariate data in R^(d) are almost never truly d-dimensional. That is, the underlying structure of the data is almost always of dimensionality lower than d. Extracting that structure into a low-dimensional representation has been the subject of countless studies over the past 50 years, and several techniques have been devised and popularized through the widespread availability of commercial statistical software. These techniques are divided into two main categories: linear and nonlinear.

[0006] Perhaps the most common linear dimensionality reduction technique is principal component analysis, or PCA (Hotelling, H., J. Edu. Psychol. 24:417-441; 498-520 (1933)). PCA reduces a set of partially cross-correlated data into a smaller set of orthogonal variables with minimal loss in the contribution to variation. The method has been extensively tested and is well understood, and several effective algorithms exist for computing the projection, ranging from singular value decomposition to neural networks (Oja, E., Subspace Methods of Pattern Recognition, Research Studies Press, Letchworth, England (1983); Oja, E., Neural Networks 5:927-935 (1992); Rubner, J., and Tavan, P., Europhys. Lett. 10:693-698 (1989)). PCA makes no assumptions about the probability distributions of the original variables, but is sensitive to outliers, missing data, and poor correlations due to poorly distributed variables. More importantly, the method cannot deal effectively with nonlinear structures, curved manifolds, and arbitrarily shaped clusters.

[0007] A more general methodology is Friedman's exploratory projection pursuit (EPP) (Friedman, J. H., and Tukey, J. W., IEEE Trans. Computers 23:881-890 (1974); Friedman, J. H., J. Am. Stat. Assoc. 82:249-266 (1987)). This method searches multidimensional data sets for interesting projections or views. The “interestingness” of a projection is typically formulated as an index, and is numerically maximized over all possible projections of the multivariate data. In most cases, projection pursuit aims at identifying views that exhibit significant clustering and reveal as much of the non-normally distributed structure in the data as possible. The method is general, and includes several well-known linear projection techniques as special cases, including principal component analysis (in this case, the index of interestingness is simply the sample variance of the projection). Once an interesting projection has been identified, the structure that makes the projection interesting may be removed from the data, and the process can be repeated to reveal additional structure. Although projection pursuit attempts to express some nonlinearities, if the data set is high-dimensional and highly nonlinear it may be difficult to visualize it with linear projections onto a low-dimensional display plane, even if the projection angle is carefully chosen.

[0008] Several approaches have been proposed for reproducing the nonlinear structure of higher-dimensional data spaces. The best-known techniques are self-organizing maps, auto-associative neural networks, multidimensional scaling, and nonlinear mapping.

[0009] Self-organizing maps or Kohonen networks (Kohonen, T., Self-Organizing Maps, Springer-Verlag, Heidelberg (1996)) were introduced by Kohonen in an attempt to model intelligent information processing, i.e. the ability of the brain to form reduced representations of the most relevant facts without loss of information about their interrelationships. Kohonen networks belong to a class of neural networks known as competitive learning or self-organizing networks. Their objective is to map a set of vectorial samples onto a two-dimensional lattice in a way that preserves the topology and density of the original data space. The lattice points represent neurons which receive identical input, and compete in their activities by means of lateral interactions.

[0010] The main application of self-organizing maps is in visualizing complex multivariate data on a 2-dimensional plot, and in creating abstractions reminiscent of those obtained from clustering methodologies. These reduced representations can subsequently be used for a variety of pattern recognition and classification tasks.

[0011] Another methodology is that of auto-associative neural networks (DeMers, D., and Cottrell, G., Adv. Neural Info. Proces. Sys. 5:580-587 (1993); Garrido, L., et al., Int. J. Neural Sys. 6:273-282 (1995)). These are multi-layer feed-forward networks trained to reproduce their inputs as desired outputs. They consist of an input and an output layer containing as many neurons as the number of input dimensions, and a series of hidden layers having a smaller number of units. In the first part of the network, each sample is reorganized, mixed, and compressed into a compact representation encoded by the middle layer. This representation is then decompressed by the second part of the network to reproduce the original input. Auto-associative networks can be trained using conventional back-propagation or any other related technique available for standard feed-forward architectures. A special version of the multilayer perceptron, known as a replicator network (Hecht-Nielsen, R., Science 269:1860-1863 (1995)), has been shown to be capable of representing its inputs in terms of their “natural coordinates”. These correspond to coordinates in an m-dimensional unit cube that has been transformed elastically to fit the distribution of the data. Although in practice it may be difficult to determine the inherent dimensionality of the data, the method could, in theory, be used for dimensionality reduction using a small value of m.

[0012] The aforementioned techniques can be used only for dimension reduction. A more broadly applicable method is multidimensional scaling (MDS) or nonlinear mapping (NLM). This approach emerged from the need to visualize a set of objects described by means of a similarity or distance matrix. The technique originated in the field of mathematical psychology (see Torgerson, W. S., Psychometrika, 1952, and Kruskal, J. B., Psychometrika, 1964, both of which are incorporated by reference in their entirety), and has two primary applications: 1) reducing the dimensionality of high-dimensional data in a way that preserves the original relationships of the data objects, and 2) producing Cartesian coordinate vectors from data supplied directly in the form of similarities or proximities, so that they can be analyzed with conventional statistical and data mining techniques.

[0013] Given a set of k objects, a symmetric matrix, r_(ij), of relationships between these objects, and a set of images on an m-dimensional display plane {y_(i), i=1, 2, . . . , k; y_(i) ∈ R^(m)}, the problem is to place y_(i) onto the plane in such a way that their Euclidean distances d_(ij)=∥y_(i)−y_(j)∥ approximate as closely as possible the corresponding values r_(ij). The quality of the projection is determined using a loss function such as Kruskal's stress:

$$S=\sqrt{\frac{\sum_{i<j}\left(d_{ij}-r_{ij}\right)^{2}}{\sum_{i<j}r_{ij}^{2}}} \qquad (1)$$

[0014] which is numerically minimized in order to find the optimal configuration. The actual embedding is carried out in an iterative fashion by: 1) generating an initial set of coordinates y_(i), 2) computing the distances d_(ij), 3) finding a new set of coordinates y_(i) using a steepest descent algorithm such as Kruskal's linear regression or Guttman's rank-image permutation, and 4) repeating steps 2 and 3 until the change in the stress function falls below some predefined threshold.

[0015] A particularly popular implementation is Sammon's nonlinear mapping algorithm (Sammon, J. W., IEEE Trans. Comp., 1969). This method uses a modified stress function:

$$E=\frac{\sum_{i<j}^{k}\frac{\left[r_{ij}-d_{ij}\right]^{2}}{r_{ij}}}{\sum_{i<j}^{k}r_{ij}} \qquad (2)$$

[0016] which is minimized using steepest descent. The initial coordinates, y_(i), are determined at random or by some other projection technique such as principal component analysis, and are updated using Eq. 3:

y_(ij)(t+1)=y_(ij)(t)−λΔ_(ij)(t)  (3)

[0017] where t is the iteration number and λ is the learning rate parameter, and

$$\Delta_{ij}(t)=\frac{\dfrac{\partial E(t)}{\partial y_{ij}(t)}}{\dfrac{\partial^{2}E(t)}{\partial y_{ij}(t)^{2}}} \qquad (4)$$
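The following sketch (not part of the original disclosure) illustrates the classical Sammon procedure of Eqs. 2-4 in NumPy. The function name, the fixed learning rate, and the use of the absolute value of the second derivative in the denominator (as in Sammon's original heuristic) are illustrative assumptions.

```python
import numpy as np

def sammon(X, m=2, n_iter=100, lam=0.3, eps=1e-9, seed=0):
    """Classical Sammon mapping of the rows of X into R^m (Eqs. 2-4)."""
    rng = np.random.default_rng(seed)
    N = X.shape[0]
    # pairwise input-space distances r_ij (eps guards against zero distances)
    R = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)) + eps
    c = R[np.triu_indices(N, 1)].sum()
    np.fill_diagonal(R, 1.0)
    Y = rng.normal(scale=0.01, size=(N, m))        # random initial configuration
    for _ in range(n_iter):
        D = np.sqrt(((Y[:, None, :] - Y[None, :, :]) ** 2).sum(-1)) + eps
        np.fill_diagonal(D, 1.0)
        delta = R - D
        diff = Y[:, None, :] - Y[None, :, :]       # y_i - y_j, shape (N, N, m)
        w = delta / (R * D)
        np.fill_diagonal(w, 0.0)
        # first and second partial derivatives of E with respect to y_ij (Eq. 4)
        g = -2.0 / c * (w[:, :, None] * diff).sum(axis=1)
        h = -2.0 / c * ((1.0 / (R * D))[:, :, None] *
                        (delta[:, :, None] -
                         diff ** 2 / D[:, :, None] *
                         (1.0 + (delta / D)[:, :, None]))).sum(axis=1)
        Y = Y - lam * g / np.abs(h).clip(min=eps)  # Eq. 3 (using |second derivative|)
    return Y
```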

[0018] There is a wide variety of MDS algorithms involving different error functions and optimization heuristics, which are reviewed in Schiffman, Reynolds and Young, Introduction to Multidimensional Scaling, Academic Press, New York (1981); Young and Hamer, Multidimensional Scaling: History, Theory and Applications, Erlbaum Associates, Inc., Hillsdale, N.J. (1987); Cox and Cox, Multidimensional Scaling, Number 59 in Monographs in Statistics and Applied Probability, Chapman-Hall (1994); and Borg, I., and Groenen, P., Modern Multidimensional Scaling, Springer-Verlag, New York (1997). The contents of these publications are incorporated herein by reference in their entireties. Different forms of NLM will be discussed in greater detail below.

[0019] Unfortunately, the quadratic nature of the stress function (Eqs. 1 and 2, and their variants) makes these algorithms impractical for large data sets containing more than a few hundred to a few thousand items. Several attempts have been made to reduce the complexity of the task. Chang and Lee (Chang, C. L., and Lee, R. C. T., IEEE Trans. Syst., Man, Cybern., 1973, SMC-3, 197-200) proposed a heuristic relaxation approach in which a subset of the original objects (the frame) is scaled using a Sammon-like methodology, and the remaining objects are then added to the map by adjusting their distances to the objects in the frame. An alternative approach proposed by Pykett (Pykett, C. E., Electron. Lett., 1978, 14, 799-800) is to partition the data into a set of disjoint clusters, and map only the cluster prototypes, i.e. the centroids of the pattern vectors in each class. In the resulting two-dimensional plots, the cluster prototypes are represented as circles whose radii are proportional to the spread in their respective classes. Lee, Slagle and Blum (Lee, R. C. Y., Slagle, J. R., and Blum, H., IEEE Trans. Comput., 1977, C-27, 288-292) proposed a triangulation method which restricts attention to only a subset of the distances between the data samples. This method positions each pattern on the plane in a way that preserves its distances from the two nearest neighbors already mapped. An arbitrarily selected reference pattern may also be used to ensure that the resulting map is globally ordered. Biswas, Jain and Dubes (Biswas, G., Jain, A. K., and Dubes, R. C., IEEE Trans. Pattern Anal. Machine Intell., 1981, PAMI-3(6), 701-708) later proposed a hybrid approach which combined the ability of Sammon's algorithm to preserve global information with the efficiency of Lee's triangulation method. While the triangulation can be computed quickly compared to conventional MDS methods, it tries to preserve only a small fraction of relationships, and the projection may be difficult to interpret for large data sets.

[0020] The methods described above are iterative in nature, and do not provide an explicit mapping function that can be used to project new, unseen patterns in an efficient manner. The first attempt to encode a nonlinear mapping as an explicit function is due to Mao and Jain (Mao, J., and Jain, A. K., IEEE Trans. Neural Networks 6(2):296-317 (1995)). They proposed a 3-layer feed-forward neural network with n input and m output units, where n and m are the number of input and output dimensions, respectively. The system is trained using a special back-propagation rule that relies on errors that are functions of the inter-pattern distances. However, because only a single distance is examined during each iteration, these networks require a very large number of iterations and converge extremely slowly.

[0021] An alternative methodology is to employ Sammon's nonlinear mapping algorithm to project a small random sample of objects from a given population, and then “learn” the underlying nonlinear transform using a multilayer neural network trained with the standard error back-propagation algorithm or some other equivalent technique (see, for example, Haykin, S., Neural Networks: A Comprehensive Foundation, Prentice-Hall, 1998). Once trained, the neural network can be used in a feed-forward manner to project the remaining objects in the plurality of objects, as well as new, unseen objects. Thus, for a nonlinear projection from n to m dimensions, a standard 3-layer neural network with n input and m output units is used. Each n-dimensional object is presented to the input layer, and its coordinates on the m-dimensional nonlinear map are obtained by the respective units in the output layer (Pal, N. R., and Eluri, V. K., IEEE Trans. Neural Net., 1142-1154 (1998)).

[0022] The distinct advantage of this approach is that it captures the nonlinear mapping relationship in an explicit function, and allows the scaling of additional patterns as they become available, without the need to reconstruct the entire map. It does, however, rely on conventional MDS methodologies to construct the nonlinear map of the training set, and therefore the method is inherently limited to relatively small samples.

[0023] Hence there is a need for a method that can efficiently process large data sets, e.g., data sets containing hundreds of thousands to millions of items.

[0024] Moreover, like the methods of Mao and Jain (Mao, J., and Jain, A. K., IEEE Trans. Neural Networks 6(2):296-317 (1995)) and Pal and Eluri (Pal, N. R., and Eluri, V. K., IEEE Trans. Neural Net., 1142-1154 (1998)), such a method should be incremental in nature, allowing the mapping of new samples as they become available without the need to reconstruct an entire map.

SUMMARY OF THE INVENTION

[0025] A method and computer program product are presented for mapping input patterns of high dimensionality into a lower dimensional space so as to preserve the relationships between these patterns in the higher dimensional space. A subset of the input patterns is chosen and mapped into the lower dimensional space using an iterative process based on subset refinements. A set of local regions is defined using a clustering methodology, and a local neural network is associated with each of these regions and trained in accordance with the mapping obtained from the iterative process. Additional input patterns not in the subset are mapped into the lower dimensional space by using one of the local neural networks. In an alternative embodiment, the local neural networks are only used after training and use of a global neural network. The global neural network is trained in accordance with the results of the mapping produced by the iterative process. Input patterns are fed into the global neural network, resulting in patterns in the lower dimensional space. Local neural networks are then used to refine the results of the global network.

[0026] The method and computer program product described herein permit the mapping of massive data sets from a higher dimensional space to a lower dimensional space. Moreover, the method allows the mapping of new input patterns as they become available, without the need to reconstruct an entire map.

[0027] The foregoing and other features and advantages of the invention will be apparent from the following, more particular description of a preferred embodiment of the invention, as illustrated in the accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS/FIGURES

[0028] FIG. 1 illustrates possibilities for a single hypothetical pairwise relationship and distances of corresponding objects on a nonlinear map.

[0029] FIG. 2 is a flowchart illustrating the phases of the method of the invention.

[0030] FIG. 3 is a flowchart illustrating the training phase of the invention, according to an embodiment.

[0031] FIG. 4 is a flowchart illustrating the use of a fuzzy clustering methodology in the selection of reference patterns, according to an embodiment of the invention.

[0032] FIG. 5 illustrates the concept of Voronoi cells, as used in an embodiment of the invention.

[0033] FIG. 6 is a flowchart illustrating the projection of input patterns, according to an embodiment of the invention.

[0034] FIG. 7 illustrates the operation of local neural networks, according to an embodiment of the invention.

[0035] FIG. 8 is a flowchart illustrating the training phase of the invention, according to an alternative embodiment.

[0036] FIG. 9 is a flowchart illustrating the projection of input patterns, according to an alternative embodiment of the invention.

[0037] FIG. 10 illustrates the operation of global and local neural networks, according to an alternative embodiment of the invention.

[0038] FIG. 11 illustrates a computing environment within which the invention can operate.

DETAILED DESCRIPTION OF THE INVENTION

[0039] A preferred embodiment of the present invention is now described with reference to the figures, where like reference numbers indicate identical or functionally similar elements. Also in the figures, the leftmost digit of each reference number corresponds to the figure in which the reference number is first used. While specific configurations and arrangements are discussed, it should be understood that this is done for illustrative purposes only. A person skilled in the relevant art will recognize that other configurations and arrangements can be used without departing from the spirit and scope of the invention. It will be apparent to a person skilled in the relevant art that this invention can also be employed in a variety of other devices and applications.

[0040] I. Introduction

[0041] A. Overview

[0042] A neural network architecture for reducing the dimensionality of very large data sets is presented here. The method is rooted in the principle of probability sampling, i.e. the notion that a small number of randomly chosen members of a given population will tend to have the same characteristics, and in the same proportion, as the population as a whole. The approach employs an iterative algorithm based on subset refinements to nonlinearly map a small random sample which reflects the overall structure of the data, and then “learns” the underlying nonlinear transform using a set of distributed neural networks, each specializing in a particular domain of the feature space. The partitioning of the data space can be carried out using a clustering methodology. This local approach eliminates a significant portion of the imperfection of the nonlinear maps produced by a single multi-layer perceptron, and does so without a significant computational overhead. The proposed architecture is general and can be used to extract constraint surfaces of any desired dimensionality.

[0043] The following section discusses methods that can be used to nonlinearly map a random subset of the data.

[0044] B. Nonlinear Mapping Using Subset Refinements

[0045] 1. Overview

[0046] A nonlinear mapping algorithm that is well suited for large data sets is presented in U.S. patent application Ser. No. 09/303,671, filed May 3, 1999, titled “Method, System and Computer Program Product for Nonlinear Mapping of Multidimensional Data”, and U.S. patent application Ser. No. 09/073,845, filed May 7, 1998, titled “Method, System and Computer Program Product for Representing Proximity Data in a Multidimensional Space”. This approach uses iterative refinement of coordinates based on partial or stochastic errors.

[0047] The method uses a self-organizing principle to iteratively refine an initial (random or partially ordered) configuration of objects by analyzing only a subset of objects and their associated relationships at a time. The relationship data may be complete or incomplete (i.e. some relationships between objects may not be known), exact or inexact (i.e. some or all relationships may be given in terms of allowed ranges or limits), symmetric or asymmetric (i.e. the relationship of object A to object B may not be the same as the relationship of B to A), and may contain systematic or stochastic errors.

[0048] The relationships between objects may be derived directly from observation, measurement, a priori knowledge, or intuition, or may be determined directly or indirectly using any suitable technique for deriving such relationships.

[0049] The invention determines the coordinates of a plurality of objects on the m-dimensional nonlinear map by:

[0050] (1) placing the objects on the m-dimensional nonlinear map;

[0051] (2) selecting a subset of the objects, wherein the selected subset of objects includes associated relationships between objects in the selected subset;

[0052] (3) revising the coordinate(s) of one or more objects in the selected subset of objects on the m-dimensional nonlinear map based on the relationship(s) between some of these objects and their corresponding distance(s) on the nonlinear map; and

[0053] (4) repeating steps (2) and (3) for additional subsets of objects from the plurality of objects.

[0054] In one embodiment, subsets of objects can be selected randomly, semi-randomly, systematically, partially systematically, etc. As subsets of objects are analyzed and their distances on the nonlinear map are revised, the set of objects tends to self-organize.

[0055] In a preferred embodiment, the invention iteratively analyzes a pair of objects at a time; that is, step (2) is carried out by selecting a pair of objects having an associated pairwise relationship. Pairs of objects can be selected randomly, semi-randomly, systematically, partially systematically, etc. Novel algorithms and techniques for pairwise analysis are provided in the sections below. This embodiment is described for illustrative purposes only and is not limiting.

[0056] 2. Pairwise Relationship Matrices without Uncertainties

[0057] a. Full Pairwise Relationship Matrices without Uncertainties

[0058] The discussion in this section assumes that all pairwise relationships are known, and they are all exact. In a preferred embodiment, the method starts with an initial configuration of points generated at random or by some other procedure such as principal component analysis. This initial configuration is then continuously refined by repeatedly selecting two objects, i, j, at random, and modifying their coordinates on the nonlinear map according to Eq. 5:

y_(i)(t+1)=ƒ(t, y_(i)(t), y_(j)(t), r_(ij))  (5)

[0059] where t is the current iteration, y_(i)(t) and y_(j)(t) are the current coordinates of the i-th and j-th objects on the nonlinear map, y_(i)(t+1) are the new coordinates of the i-th object on the nonlinear map, and r_(ij) is the relationship between the i-th and j-th objects. ƒ(.) in Eq. 5 above can assume any functional form. Ideally, this function should try to minimize the difference between the distance on the nonlinear map and the actual relationship between the i-th and j-th objects. For example, ƒ(.) may be given by Eq. 6:

$$y_{i}(t+1)=y_{i}(t)+0.5\,\lambda(t)\,\frac{r_{ij}-d_{ij}(t)}{d_{ij}(t)}\left(y_{i}(t)-y_{j}(t)\right) \qquad (6)$$

[0060] where t is the iteration number, d_(ij)=∥y_(i)(t)−y_(j)(t)∥, and λ(t) is an adjustable parameter, referred to hereafter as the “learning rate”. This process is repeated for a fixed number of cycles, or until some global error criterion is minimized within some prescribed tolerance. A large number of iterations are typically required to achieve statistical accuracy.
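A minimal sketch of the pairwise refinement of Eqs. 5 and 6 follows; it is illustrative only. The symmetric update of both selected objects and the linearly decaying learning rate are assumptions not dictated by the text, which defines the update for y_(i) only.

```python
import numpy as np

def pairwise_refine(R, m=2, n_steps=200000, lam0=0.5, lam1=0.01, seed=0):
    """Stochastic pairwise refinement of a nonlinear map (Eqs. 5 and 6).
    R is a full k-by-k matrix of target relationships r_ij."""
    rng = np.random.default_rng(seed)
    k = R.shape[0]
    Y = rng.uniform(size=(k, m))                  # random initial configuration
    for t in range(n_steps):
        lam = lam0 + (lam1 - lam0) * t / n_steps  # monotonically decreasing learning rate
        i, j = rng.choice(k, size=2, replace=False)
        d = np.linalg.norm(Y[i] - Y[j]) + 1e-12
        corr = 0.5 * lam * (R[i, j] - d) / d * (Y[i] - Y[j])
        # Eq. 6 updates y_i; moving both points symmetrically is a common
        # variant and is used here purely for illustration.
        Y[i] += corr
        Y[j] -= corr
    return Y
```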

[0061] The method described above is generally reminiscent of the error back-propagation procedure for training artificial neural networks described in Werbos, Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences, PhD Thesis, Harvard University, Cambridge, Mass. (1974), and Rumelhart and McClelland, Eds., Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. 1, MIT Press, Cambridge, Mass. (1986), both of which are incorporated herein by reference in their entireties.

[0062] The learning rate λ(t) in Eq. 6 plays a key role in ensuring convergence. If λ is too small, the coordinate updates are small, and convergence is slow. If, on the other hand, λ is too large, the rate of learning may be accelerated, but the nonlinear map may become unstable (i.e. oscillatory). Typically, λ ranges in the interval [0, 1] and may be fixed, or it may decrease monotonically during the refinement process. Moreover, λ may also be a function of i, j, r_(ij), and/or d_(ij), and can be used to apply different weights to certain objects, relationships, distances and/or relationship or distance pairs. For example, λ may be computed by Eq. 7:

$$\lambda(t)=\left(\lambda_{\min}+t\,\frac{\lambda_{\max}-\lambda_{\min}}{T}\right)\frac{1}{1+\alpha r_{ij}} \qquad (7)$$

or Eq. 8:

$$\lambda(t)=\left(\lambda_{\min}+t\,\frac{\lambda_{\max}-\lambda_{\min}}{T}\right)^{-\alpha r_{ij}} \qquad (8)$$

[0063] where λ_(max) and λ_(min) are the (unweighted) starting and ending learning rates such that λ_(max), λ_(min) ∈ [0,1], T is the total number of refinement steps (iterations), t is the current iteration number, and α is a constant scaling factor. Eqs. 7 and 8 have the effect of decreasing the correction at large separations (weak relationships), thus creating a nonlinear map which preserves strong relationships (short distances) more faithfully than weak ones. Weighting is discussed in greater detail below.
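For illustration, the two learning-rate schedules of Eqs. 7 and 8 may be written as simple functions; the default parameter values below are assumptions.

```python
def lambda_eq7(t, T, r_ij, lam_min=0.01, lam_max=0.5, alpha=1.0):
    """Relationship-weighted learning rate of Eq. 7."""
    return (lam_min + t * (lam_max - lam_min) / T) / (1.0 + alpha * r_ij)

def lambda_eq8(t, T, r_ij, lam_min=0.01, lam_max=0.5, alpha=1.0):
    """Relationship-weighted learning rate of Eq. 8."""
    return (lam_min + t * (lam_max - lam_min) / T) ** (-alpha * r_ij)
```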

[0064] One of the main advantages of this approach is that it makes partial refinements possible. It is often sufficient that the pairwise similarities are represented only approximately to reveal the general structure and topology of the data. Unlike traditional MDS, this approach allows very fine control of the refinement process. Moreover, as the nonlinear map self-organizes, the pairwise refinements become cooperative, which partially alleviates the quadratic nature of the problem.

[0065] The embedding procedure described above does not guarantee convergence to the global minimum (i.e., the most faithful embedding in a least-squares sense). If so desired, the refinement process may be repeated a number of times from different starting configurations and/or random number seeds.

[0066] The general algorithm described above can also be applied when the pairwise similarity matrix is incomplete, i.e. when some of the pairwise similarities are unknown, when some of the pairwise similarities are uncertain or corrupt, or both of the above. These cases are discussed separately below.

[0067] b. Sparse Pairwise Relationship Matrices without Uncertainties

[0068] The general algorithm described above can also be applied when the pairwise relationship matrix is incomplete, i.e. when some of the pairwise relationships are unknown. In this case, a similar algorithm to the one described above can be used, with the exception that the algorithm iterates over pairs of objects for which the relationships are known. In this case, the algorithm identifies configurations in space that satisfy the known pairwise relationships; the unknown pairwise relationships adapt during the course of refinement and eventually assume values that lead to a satisfactory embedding of the known relationships.

[0069] Depending on the amount of missing data, there may be more than one satisfactory embedding (mapping) of the original relationship matrix. In this case, different configurations (maps) may be derived from different starting configurations or random number seeds. In some applications, such as searching the conformational space of molecules, this feature provides a significant advantage over some alternative techniques. All variants of the original algorithm (see Sections below) can be used in this context.

[0070] 3. Pairwise Relationship Matrices with Bounded Uncertainties

[0071] The general algorithm described above can also be applied when the pairwise relationships contain bounded uncertainties, i.e. when some of the pairwise relationships are only known to within certain fixed tolerances (for example, the relationships are known to lie within a range or set of ranges with prescribed upper and lower bounds). In this case, a similar algorithm to the one described above can be used, with the exception that the distances on the nonlinear map are corrected only when the corresponding objects lie outside the prescribed bounds. For example, assume that the relationship between two objects, i and j, is given in terms of an upper and lower bound, r_(max) and r_(min), respectively. When this pair of objects is selected during the course of the refinement, the distance of the objects on the nonlinear map is computed, and denoted as d_(ij). If d_(ij) is larger than r_(max), the coordinates of the objects are updated using r_(max) as the target distance (Eq. 9):

y_(i)(t+1)=ƒ(t, y_(i)(t), y_(j)(t), r_(max))  (9)

[0072] Conversely, if d_(ij) is smaller than r_(min), the coordinates of the objects are updated using r_(min) as the target distance (Eq. 10):

y_(i)(t+1)=ƒ(t, y_(i)(t), y_(j)(t), r_(min))  (10)

[0073] If d_(ij) lies between the upper and lower bounds (i.e. if r_(min)≦d_(ij)≦r_(max)), no correction is made. In other words, the algorithm attempts to match the upper bound if the current distance between the objects is greater than the upper bound, or the lower bound if the current distance between the objects is lower than the lower bound. If the distance between the objects lies within the upper and lower bounds, no correction is made.

[0074] This algorithm can be extended to the case where some of the pairwise relationships are given by a finite set of allowed discrete values, or by a set of ranges of values, or some combination thereof. For the purposes of the discussion below, we consider discrete values as ranges of zero width (e.g. the discrete value of 2 can be represented as the range [2,2]).

[0075] Various possibilities for a single hypothetical pairwise relationship and the current distance of the corresponding objects on the nonlinear map are illustrated in FIG. 1, where shaded areas 110, 112 and 114 denote allowed ranges for a given pairwise relationship. Distances d1-d5 illustrate 5 different possibilities for the current distance between the corresponding objects on the nonlinear map. Arrows 116, 118, 120 and 122 indicate the direction of the correction that should be applied to the objects on the map. Arrows 118 and 122 point to the left, indicating that the coordinates of the associated objects on the nonlinear map should be updated so that the objects come closer together. Arrows 116 and 120 point to the right, indicating that the coordinates of the associated objects should be updated so that the objects become more distant.

[0076] As in the case of a single range, if the current distance of a selected pair of objects on the nonlinear map lies within any of the prescribed ranges, no coordinate update takes place (i.e., case d1 in FIG. 1). If not, the correction is applied using the nearest range boundary as the target distance (i.e., cases d2-d5 in FIG. 1). For example, if the relationship between a given pair of objects lies in the ranges [1,2], [3,5] and [6,7] and their current distance on the nonlinear map is 2.9 (d5 in FIG. 1), the correction takes place using 3 as the target distance (r_(ij)) in Eq. 5. If, however, the current distance is 2.1, the coordinates are updated using 2 as the target distance (r_(ij)) in Eq. 5.

[0077] This deterministic criterion may be replaced by a stochastic or probabilistic one in which the target distance is selected either randomly or with a probability that depends on the difference between the current distance and the two nearest range boundaries. In the example described above (d5 in FIG. 1), a probabilistic choice between 2 and 3 as a target distance could be made, with probabilities of, for example, 0.1 and 0.9, respectively (that is, 2 could be selected as the target distance with probability 0.1, and 3 with probability 0.9). Any method for deriving such probabilities can be used. Alternatively, either 2 or 3 could be chosen as the target distance at random.
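A small sketch of the deterministic range-based target selection described above (covering both the single-range case of Eqs. 9 and 10 and the multi-range case of FIG. 1) might look as follows; the function name and the representation of ranges as (low, high) tuples are illustrative assumptions.

```python
def target_distance(d, ranges):
    """Pick the target distance for a relationship given as a set of allowed
    ranges.  `ranges` is a list of (low, high) tuples; a discrete value v is
    passed as (v, v).  Returns None when the current map distance d already
    lies inside one of the allowed ranges (no coordinate update is made)."""
    boundaries = []
    for lo, hi in ranges:
        if lo <= d <= hi:
            return None                            # case d1 in FIG. 1
        boundaries.extend((lo, hi))
    # otherwise use the nearest range boundary as the target (cases d2-d5)
    return min(boundaries, key=lambda b: abs(b - d))
```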

[0078] Bounded uncertainties in the pairwise relationships may represent, for example, stochastic or systematic errors or noise associated with a physical measurement, and can, in general, differ from one pairwise relationship to another. A typical example is the Nuclear Overhauser Effect (NOE) in multidimensional Nuclear Magnetic Resonance spectrometry. Alternatively, the uncertainty may result from multiple measurements of a given relationship.

[0079] An alternative algorithm for dealing with uncertainties is to reduce the magnitude of the correction for pairs of objects whose relationship is thought to be uncertain. In this scheme, the magnitude of the correction, as determined by the learning rate in Eq. 8, for example, is reduced for pairwise relationships which are thought to be uncertain. The magnitude of the correction may depend on the degree of uncertainty associated with the corresponding pairwise relationship (for example, the magnitude of the correction may be inversely proportional to the uncertainty associated with the corresponding pairwise relationship). If the existence and/or magnitude of the errors is unknown, then the errors can be determined automatically by the algorithm.

[0080] 4. Pairwise Relationship Matrices with Unbounded Uncertainties

[0081] The ideas described in the preceding Sections can be applied when some of the pairwise relationships are thought to contain corrupt data, that is, when some of the pairwise relationships are incorrect and bear essentially no relationship to the actual values. In this case, “problematic” relationships can be detected during the course of the algorithm, and removed from subsequent processing. In other words, the objective is to identify the corrupt entries and remove them from the relationship matrix. This process results in a sparse relationship matrix, which can be refined using the algorithm in Section 2.b above.

[0082] 5. Modifications of the Basic Algorithm

[0083] In many cases, the algorithm described above may be accelerated by pre-ordering the data using a suitable statistical method. For example, if the proximities are derived from data that is available in vectorial or binary form, the initial configuration of the points on the nonlinear map may be computed using principal component analysis. In a preferred embodiment, the initial configuration may be constructed from the first m principal components of the feature matrix (i.e. the m latent variables which account for most of the variance in the data). This technique can have a profound impact on the speed of refinement. Indeed, if a random initial configuration is used, a significant portion of the training time is spent establishing the general structure and topology of the nonlinear map, which is typically characterized by large rearrangements. If, on the other hand, the input configuration is partially ordered, the error criterion can be reduced relatively rapidly to an acceptable level.

[0084] If the data is highly clustered, by virtue of the sampling process low-density areas may be refined less effectively than high-density areas. In one embodiment, this tendency may be partially compensated by a modification to the original algorithm, which increases the sampling probability in low-density areas. This type of biased sampling may be followed with regular, unbiased sampling, and this process may be repeated any number of times in any desired sequence.

[0085] Generally, the basic algorithm does not distinguish weak from strong relationships (long-range and short-range distances, respectively). One method to ensure that strong relationships are preserved more faithfully than weak relationships is to weight the coordinate update in Eq. 5 (or, equivalently, the learning rate λ in Eqs. 7 and 8) by a scaling factor that is inversely proportional to the strength (magnitude) of the relationship.

[0086] An alternative (and complementary) approach is to ensure that objects at close separation are sampled more extensively than objects at long separation. For example, an alternating sequence of global and local refinement cycles, similar to the one described above, can be employed. In this embodiment, a phase of global refinement is initially carried out, after which the resulting nonlinear map is partitioned into a regular grid. The points (objects) in each cell of the grid are then subjected to a phase of local refinement (i.e. only objects from within the same cell are compared and refined). Preferably, the number of sampling steps in each cell should be proportional to the number of objects contained in that cell. This process is highly parallelizable. This local refinement phase is then followed by another global refinement phase, and the process is repeated for a prescribed number of cycles, or until the embedding error is minimized within a prescribed tolerance. Alternatively, the grid method may be replaced by another suitable method for identifying proximal points, such as clustering, for example.

[0087] The methods described herein may be used for incremental mapping. That is, starting from an organized nonlinear map of a set of objects, a new set of objects may be added without modification of the original map. In an exemplary embodiment, the new set of objects may be “diffused” into the existing map, using a modification of the basic algorithm described above. In particular, Eqs. 5 and 6 can be used to update only the additional objects. In addition, the sampling procedure ensures that the selected pairs contain at least one object from the new set. That is, two objects are selected at random so that at least one of these objects belongs to the new set. Alternatively, each new object may be added independently using the approach described above.
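The following sketch, which is illustrative and not part of the original disclosure, outlines how new objects might be "diffused" into an existing map: only the coordinates of the new objects are updated, and every sampled pair contains at least one new object. The layout of the relationship matrix is an assumption made for the example.

```python
import numpy as np

def diffuse_new_points(Y_old, R_new_to_all, n_steps=100000, lam=0.1, seed=0):
    """'Diffuse' new objects into an existing map, leaving the old map fixed.
    Y_old: existing map coordinates, shape (k_old, m).
    R_new_to_all: relationships of the k_new new objects to all k_old + k_new
    objects, shape (k_new, k_old + k_new)."""
    rng = np.random.default_rng(seed)
    k_old, m = Y_old.shape
    k_new = R_new_to_all.shape[0]
    Y = np.vstack([Y_old, rng.uniform(size=(k_new, m))])
    for _ in range(n_steps):
        i = k_old + rng.integers(k_new)            # at least one member of each pair is new
        j = int(rng.integers(k_old + k_new))
        if i == j:
            continue
        d = np.linalg.norm(Y[i] - Y[j]) + 1e-12
        r = R_new_to_all[i - k_old, j]
        Y[i] += lam * (r - d) / d * (Y[i] - Y[j])  # only the new object moves
    return Y[k_old:]
```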

[0088] II. Method

[0089] A. Nonlinear Mapping Networks—Algorithm I

[0090] The process described herein uses the iterative nonlinear mapping algorithm described in Section I.B to multidimensionally scale a small random sample of a set of input patterns of dimensionality n, and then “learns” the underlying nonlinear transform using an artificial neural network. For a nonlinear projection from n to m dimensions, a simple 3-layer network with n input and m output units can be employed. The network is trained to reproduce the input/output coordinates produced by the iterative algorithm, and thus encodes the mapping in its synaptic parameters in a compact, analytical manner. Once trained, the neural network can be used in a feed-forward fashion to project the remaining members of the input set, as well as new, unseen samples with minimal distortion.

[0091] The method of the invention is illustrated generally in FIG. 2. The method begins at step 205. In step 210, the training of a neural network takes place, where the training is based on the results (i.e., the inputs and outputs) of the iterative algorithm. In step 215, points in R^(n) are projected into R^(m) by a feed-forward pass through the trained neural network. The process concludes with step 220.
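As an illustration of steps 210 and 215, a 3-layer network can be fitted to the sample coordinates produced by the iterative algorithm and then used to project the remaining patterns. The use of scikit-learn's MLPRegressor as the 3-layer perceptron, and all parameter values, are assumptions made for this sketch.

```python
from sklearn.neural_network import MLPRegressor

# X_sample: random sample of n-dimensional input patterns, shape (k, n)
# Y_sample: their m-dimensional images from the iterative algorithm, shape (k, m)

def train_global_net(X_sample, Y_sample, hidden=20):
    """Step 210: train a single 3-layer network to reproduce the iterative mapping."""
    net = MLPRegressor(hidden_layer_sizes=(hidden,), activation='logistic',
                       max_iter=5000)
    net.fit(X_sample, Y_sample)
    return net

# Step 215: project remaining (or new) patterns with a feed-forward pass, e.g.
#   Y_rest = train_global_net(X_sample, Y_sample).predict(X_rest)
```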

[0092] B. Local Nonlinear Mapping Networks—Algorithm II

[0093] The embodiment of the invention described in this section represents a variation of the above algorithm. This approach is based on local learning. Instead of using a single “global” network to perform the nonlinear mapping across the entire input data space R^(n), this embodiment partitions the space into a set of Voronoi polyhedra, and uses a separate “local” network to project the patterns in each partition. Given a set of reference points P={P₁, P₂, . . . } in R^(n), a Voronoi polyhedron (or Voronoi cell), v(p), is a convex polytope associated with each reference point p which contains all the points in R^(n) that are closer to p than to any other point in P:

v(p)={x ∈ R^(n) | d(x,p)≦d(x,q) ∀q∈P, q≠p}  (11)

[0094] where d( ) is a distance function. In an embodiment of the invention, d( ) is the Euclidean distance function. Voronoi cells partition the input data space R^(n) into local regions “centered” at the reference points P, also referred to as centroids. Hereafter, the local networks associated with each Voronoi cell are said to be centered at the points P, and the distance of a point in R^(n) from a local network will refer to the distance of that point from the network's center.

[0095] The training phase involves the following general steps: a training set is extracted from the set of input patterns and mapped using the iterative nonlinear mapping algorithm described in Section I.B. A set of reference points in the input space R^(n) is then selected, and the objects comprising the training set are partitioned into disjoint sets containing the patterns falling within the respective Voronoi cells. Patterns that lie on the sides and vertices of the Voronoi cells (i.e. are equidistant from two or more points in P) are arbitrarily assigned to one of the cells. A local network is then assigned to each cell, and is trained to reproduce the input/output mapping of the input patterns in that cell. While the direct nonlinear map is obtained globally, the networks are trained locally using only the input patterns within their respective Voronoi partitions. Again, simple 3-layer perceptrons with n input and m output units can be employed, where n and m are the dimensionalities of the input and output spaces, respectively.

[0096] The training phase of the method of the invention therefore involves the following steps, as illustrated in FIG. 3. The training phase begins at step 305. In step 310, a random set of points {x_(i), i=1,2, . . . , k; x_(i) ∈ R^(n)} is extracted from the set of input patterns. In step 315, the points x_(i) are mapped from R^(n) to R^(m) using the iterative nonlinear mapping algorithm described in Section I.B (x_(i)→y_(i), i=1,2, . . . , k, y_(i) ∈ R^(m)). This mapping serves to define a training set T of ordered pairs (x_(i), y_(i)), T={(x_(i), y_(i)), i=1,2, . . . , k}.

[0097] In step 320, a set of reference points P={c_(i), i=1,2, . . . , c; c_(i) ∈ R^(n)} is determined. In an embodiment of the invention, the reference points c_(i) are determined using a clustering algorithm described in greater detail below. In step 325, the training set T is partitioned into c disjoint clusters based on the distance of each point x_(i) from each reference point. The set of disjoint clusters is denoted {C_(j)={(x_(i), y_(i)): d(x_(i), c_(j))≦d(x_(i), c_(k)) for all k≠j}; j=1,2, . . . , c; i=1,2, . . . , k}. In step 330, c independent local networks {Net_(i) ^(L), i=1,2, . . . , c} are trained with the respective training subsets C_(i) derived in step 325. The training phase concludes with step 335.
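A sketch of steps 310-330 is given below, again using scikit-learn's MLPRegressor to stand in for the local 3-layer perceptrons; the helper name, the hidden-layer size, and the assumption that every cell receives at least one training pattern are illustrative.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def train_local_nets(X_sample, Y_sample, centroids, hidden=20):
    """Steps 325-330: partition the training set by nearest reference point in
    R^n and train one local network per Voronoi cell.  Assumes every cell
    contains at least one training pattern (cf. the balanced partitions
    discussed below)."""
    dists = np.linalg.norm(X_sample[:, None, :] - centroids[None, :, :], axis=-1)
    assign = dists.argmin(axis=1)                  # nearest reference point (Eq. 11)
    nets = []
    for j in range(len(centroids)):
        net = MLPRegressor(hidden_layer_sizes=(hidden,), activation='logistic',
                           max_iter=5000)
        net.fit(X_sample[assign == j], Y_sample[assign == j])
        nets.append(net)
    return nets
```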

[0098] Clearly, an important choice to be made concerns the partitioning. In general, the reference points c_(i) (determined in step 320) should be well distributed and should produce balanced partitions that contain a comparable number of training patterns. This is necessary in order to avoid the creation of poorly optimized networks due to an insufficient number of training cases. In one embodiment, described here, the reference points c_(i) can be determined using the fuzzy c-means (FCM) algorithm (Bezdek, J. C., Pattern Recognition with Fuzzy Objective Function Algorithms, Plenum Press, 1981). The FCM algorithm uses the probabilistic constraint that the sum of the memberships of a data point over all clusters must be equal to 1, and has been most successful in situations where the final objective is a crisp decision, as is the case in the problem at hand.

[0099] The FCM algorithm attempts to minimize the objective function:

$$J_{q}=\sum_{j=1}^{C}\sum_{i=1}^{N}m_{ij}^{q}\,d^{2}(x_{i},c_{j}) \qquad (12)$$

[0100] over a set of points x_(i), i=1,2, . . . , N, with respect to the fuzzy degrees of membership m_(ij), the “fuzziness” index q, and the cluster centroids c_(j), j=1, 2, . . . , C, where C is the total number of clusters, and m_(ij) is the degree of membership of the i-th pattern to the j-th cluster. In addition, the solution must satisfy the constraints that the membership value of each pattern in any given cluster must lie between zero and one:

0≦m_(ij)≦1  (13)

[0101] and the sum of its membership values over all clusters must be equal to 1:

$$\sum_{j=1}^{C}m_{ij}=1 \qquad (14)$$

[0102] In an embodiment of the invention, the squared distance d²(x_(i), c_(j)) is the squared Euclidean distance between the i-th point in R^(n) and the j-th reference point. Equations 13 and 14 ensure that the solutions to Eq. 12 represent true fuzzy partitions of the data set among all the specified classes. In the above equation, q ∈ [1, ∞) is a weighting exponent known as the “fuzziness index” or “fuzzifier” that controls the degree of fuzziness of the resulting clusters. For q=1 the partitions are crisp, and as q→∞ the clusters become increasingly fuzzy.

[0103] The determination of reference points c_(j), j=1,2, . . . , C using FCM is illustrated in FIG. 4, according to an embodiment of the invention. The process starts with step 405. In step 410, an iteration counter p is initialized. In step 415, an initial choice for the cluster centroids {c_(j), j=1,2, . . . , C} is made. Given this choice for {c_(j)}, in step 420 the degree of membership of each point x_(i) in each cluster is calculated using the following formula:

$$m_{ij}=\frac{\left[\dfrac{1}{d^{2}(x_{i},c_{j})}\right]^{\frac{1}{q-1}}}{\sum_{k=1}^{C}\left[\dfrac{1}{d^{2}(x_{i},c_{k})}\right]^{\frac{1}{q-1}}} \qquad (15)$$

[0104] In step 425, the objective function J_(q) ^(p) is evaluated using Eq. 16:

$$J_{q}^{p}=\sum_{j=1}^{C}\sum_{i=1}^{N}m_{ij}^{q}\,d^{2}(x_{i},c_{j}) \qquad (16)$$

[0105] In step 430, new centroids c_(j) are computed using the formula:

$$c_{j}=\frac{\sum_{i=1}^{N}m_{ij}^{q}\,x_{i}}{\sum_{i=1}^{N}m_{ij}^{q}} \qquad (17)$$

[0106] In step 435, the m_(ij) are recalculated using the new centroids and Eq. 15. In step 440, the iteration counter p is incremented. In step 445, J_(q) ^(p) is recalculated in light of the new m_(ij) determined in step 435.

[0107] In step 450, the difference between J_(q) ^(p) and its predecessor value J_(q) ^(p−1) is determined. If the difference is sufficiently small, as indicated by the inequality |J_(q) ^(p)−J_(q) ^(p−1)|≦ε, where ε is a small positive constant (here ε=0.0009), then the process concludes at step 455, and the most recently determined {c_(j)} is used as the set of reference points for partitioning purposes. Otherwise, another set of centroids is determined in step 430. The process continues until the condition of step 450 is satisfied.
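An illustrative implementation of the FCM loop of FIG. 4 (Eqs. 12-17) is sketched below. The function name and the initialization of the centroids from randomly chosen data points are assumptions; the loop folds the recalculation of m_(ij) into the start of the next iteration, which is equivalent to the flow of FIG. 4.

```python
import numpy as np

def fcm(X, C, q=2.0, eps=9e-4, max_iter=1000, seed=0):
    """Fuzzy c-means (Eqs. 12-17); returns the centroids and memberships."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(X.shape[0], size=C, replace=False)]   # step 415
    J_prev = np.inf
    for _ in range(max_iter):
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(-1) + 1e-12
        m = (1.0 / d2) ** (1.0 / (q - 1.0))        # Eq. 15 (numerator terms)
        m /= m.sum(axis=1, keepdims=True)          # Eq. 15 (normalization, Eq. 14)
        J = (m ** q * d2).sum()                    # Eq. 16
        if abs(J_prev - J) <= eps:                 # convergence test of step 450
            break
        J_prev = J
        w = m ** q
        centroids = (w.T @ X) / w.sum(axis=0)[:, None]   # Eq. 17 (step 430)
    return centroids, m
```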

[0108] Once the convergence criterion in step 450 has been met, the new centroids computed by Eq. 17 are used to partition the input data set into a set of Voronoi cells. Such cells are illustrated in FIG. 5. A set 500 is shown partitioned into Voronoi cells, such as cells 505A through 505C. The Voronoi cells include centroids 510A through 510C, respectively.

[0109] Once all the local networks are trained, additional patterns from the input set of patterns can be mapped into R^(m) as illustrated in FIG. 6. The process begins with step 605. In step 610, the distance of the input pattern x to each reference point in {c_(i), i=1,2, . . . , c; c_(i) ∈ R^(n)} is determined. In step 615, the point c_(j) that is nearest to the input pattern x is identified. In step 620, the pattern x is mapped to a point y in R^(m), x→y, x ∈ R^(n), y ∈ R^(m), using the local neural network Net_(j) ^(L) associated with the reference point c_(j) identified in step 615. The process concludes with step 625.

[0110] Note that new patterns in R^(n) that are not in the original input set can also be projected into R^(m) in the manner shown in FIG. 6. Once the system is trained, new patterns in R^(n) are mapped by identifying the nearest local network and using that network in a feed-forward manner to perform the projection. An embodiment of a system that does this is illustrated in FIG. 7. The input for the system is a pattern 705 in R^(n). This point is defined by its n attributes, x=(x₁, x₂, . . . , x_(n)). The system includes a dispatcher module 710, which compares the distance of the input point to the network centers (i.e., the reference points), and forwards the input point to one of the available local neural networks 701, 702, or 703. Specifically, the input pattern is sent to the local neural network associated with the reference point nearest to the input pattern. The chosen network then performs the final projection, resulting in an output point in R^(m).
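The dispatcher of FIG. 7 reduces to a nearest-reference-point lookup followed by a feed-forward pass through the selected local network, as in this illustrative sketch (local_nets is assumed to hold trained regressors such as those produced by the sketch above).

```python
import numpy as np

def project(x, centroids, local_nets):
    """Dispatcher of FIG. 7: route x to the local network whose reference point
    is nearest in R^n, and let that network produce the image in R^m."""
    j = int(np.argmin(np.linalg.norm(centroids - x, axis=1)))
    return local_nets[j].predict(x.reshape(1, -1))[0]
```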

[0111] C. Local Nonlinear Mapping Networks—Algorithm III

[0112] The ability of a single network to reproduce the general structure of the nonlinear map suggests an alternative embodiment to overcome some of the complexities of clustering in higher dimensions. Conceptually, the alternative embodiment differs from Algorithm II in the way it partitions the data space. In contrast to the previous method, this process partitions the output space, and clusters the training patterns based on their proximity on the m-dimensional nonlinear map rather than their proximity in the n-dimensional input space. For the training set, the assignment to a partition is straightforward: the images of the points in the training set on the nonlinear map are derived directly from the iterative algorithm described in Section I.B. For new points that are not part of the training set, the assignment is based on approximate positions derived from a global neural network trained with the entire training set, like the one described in Section II.A. The general flow of the algorithm is similar to the one described in Section II.B.

[0113] The training phase for this embodiment is illustrated in FIG. 8. The method begins at step 805. In step 810, a random set of patterns {x_(i), i=1,2, . . . , k; x_(i) ∈ R^(n)} is extracted from the input data set. In step 815, the patterns x_(i) are mapped from R^(n) to R^(m) using the iterative nonlinear mapping algorithm described in Section I.B (x_(i)→y_(i), i=1,2, . . . , k, x_(i) ∈ R^(n), y_(i) ∈ R^(m)). This mapping serves to define a training set T of ordered pairs (x_(i), y_(i)), T={(x_(i), y_(i)), i=1,2, . . . , k}.

[0114] In step 820, the points {y_(i), i=1,2, . . . , k; y_(i) ∈ R^(m)} are clustered into c clusters associated with c points in R^(m), {c_(i), i=1,2, . . . , c; c_(i) ∈ R^(m)}. In the illustrated embodiment, fuzzy clusters are formed in this step using the FCM algorithm of FIG. 4. In step 825, the training set T is partitioned into c disjoint clusters C_(j) based on the distance of the images y_(i) from the cluster prototypes, {C_(j)={(x_(i), y_(i)): d(y_(i), c_(j))≦d(y_(i), c_(k)) for all k≠j}; j=1,2, . . . , c; i=1,2, . . . , k}. In step 830, c independent local neural networks {Net_(i) ^(L), i=1,2, . . . , c} are trained with the respective clusters C_(j) derived in step 825. In step 835, a global network Net^(G) is trained with the entire training set T. The process concludes with step 840.
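For illustration, the training phase of FIG. 8 can be sketched as follows; it reuses the fcm() sketch given earlier to cluster the images y_(i) in R^(m), and scikit-learn's MLPRegressor for the local and global networks. All names and parameter values are assumptions.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def train_algorithm_iii(X_sample, Y_sample, C=10, hidden=20):
    """Steps 820-835: cluster the images y_i on the nonlinear map, train one
    local network per output-space cluster, and train a global network on the
    whole training set."""
    centroids, _ = fcm(Y_sample, C)                # reference points c_i in R^m (step 820)
    dists = np.linalg.norm(Y_sample[:, None, :] - centroids[None, :, :], axis=-1)
    assign = dists.argmin(axis=1)                  # step 825 (assumes no empty cluster)
    local_nets = []
    for j in range(C):
        net = MLPRegressor(hidden_layer_sizes=(hidden,), activation='logistic',
                           max_iter=5000)
        net.fit(X_sample[assign == j], Y_sample[assign == j])
        local_nets.append(net)                     # step 830
    global_net = MLPRegressor(hidden_layer_sizes=(hidden,), activation='logistic',
                              max_iter=5000)
    global_net.fit(X_sample, Y_sample)             # step 835
    return global_net, local_nets, centroids
```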

[0115] Once all the networks are trained, remaining input patterns from the input data set and any new patterns in R^(n) are projected using a tandem approach. An embodiment of this is illustrated in FIG. 9. The projection process begins at step 905. In step 910, each input pattern x to be projected into R^(m) is mapped, x→y′, x ∈ R^(n), y′ ∈ R^(m), using the global network Net^(G) derived in step 835.

[0116] In step 915, the distance from y′ to each reference point c_(i) in {c_(i), i=1,2, . . . , c; c_(i) ∈ R^(m)} is determined. In step 920, the point c_(j) closest to y′ is determined. In step 925, x is mapped into R^(m), x→y, using the local neural network associated with c_(j), Net_(j) ^(L). The process ends with step 930.

[0117] A system for performing the overall mapping x→y is shown in FIG. 10. First, an input pattern 1005 ∈ R^(n) is projected by the global network 1010, Net^(G), to obtain point 1012 (y′) ∈ R^(m). Point y′ can be viewed as having approximate coordinates on the nonlinear map. These coordinates are used to identify the nearest local network 1021 (Net_(j) ^(L)) from among the possible local neural networks 1021 through 1023, based on the proximity of y′ to each c_(i). Input point 1005 is projected once again, this time by the nearest local network 1021, to produce the final image 1030 on the display map.
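The tandem projection of FIGS. 9 and 10 then amounts to a global feed-forward pass, a nearest-reference-point lookup in R^(m), and a second pass through the selected local network, as in this illustrative sketch.

```python
import numpy as np

def project_tandem(x, global_net, centroids, local_nets):
    """FIGS. 9 and 10: obtain the approximate image y' from the global network,
    select the local network whose reference point in R^m is nearest to y',
    and let that network produce the final projection of x."""
    y_approx = global_net.predict(x.reshape(1, -1))[0]
    j = int(np.argmin(np.linalg.norm(centroids - y_approx, axis=1)))
    return local_nets[j].predict(x.reshape(1, -1))[0]
```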

[0118] III. Environment

[0119] The present invention may be implemented using hardware, software or a combination thereof, and may be implemented in a computer system or other processing system. An example of such a computer system 1100 is shown in FIG. 11. The computer system 1100 includes one or more processors, such as processor 1104. The processor 1104 is connected to a communication infrastructure 1106 (e.g., a bus or network). Various software embodiments can be described in terms of this exemplary computer system. After reading this description, it will become apparent to a person skilled in the relevant art how to implement the invention using other computer systems and/or computer architectures.

[0120] Computer system 1100 also includes a main memory 1108, preferably random access memory (RAM), and may also include a secondary memory 1110. The secondary memory 1110 may include, for example, a hard disk drive 1112 and/or a removable storage drive 1114, representing a floppy disk drive, a magnetic tape drive, an optical disk drive, etc. The removable storage drive 1114 reads from and/or writes to a removable storage unit 1118 in a well known manner. Removable storage unit 1118 represents a floppy disk, magnetic tape, optical disk, etc. As will be appreciated, the removable storage unit 1118 includes a computer usable storage medium having stored therein computer software and/or data. In an embodiment of the invention, removable storage unit 1118 can contain input data to be projected.

[0121] Secondary memory 1110 can also include other similar means for allowing computer programs or input data to be loaded into computer system 1100. Such means may include, for example, a removable storage unit 1122 and an interface 1120. Examples of such may include a program cartridge and cartridge interface (such as that found in video game devices), a removable memory chip (such as an EPROM, or PROM) and associated socket, and other removable storage units 1122 and interfaces 1120 which allow software and data to be transferred from the removable storage unit 1122 to computer system 1100.

[0122] Computer system 1100 may also include a communications interface 1124. Communications interface 1124 allows software and data to be transferred between computer system 1100 and external devices. Examples of communications interface 1124 may include a modem, a network interface (such as an Ethernet card), a communications port, a PCMCIA slot and card, etc. Software and data transferred via communications interface 1124 are in the form of signals 1128, which may be electronic, electromagnetic, optical or other signals capable of being received by communications interface 1124. These signals 1128 are provided to communications interface 1124 via a communications path (i.e., channel) 1126. This channel 1126 carries signals 1128 and may be implemented using wire or cable, fiber optics, a phone line, a cellular phone link, an RF link and other communications channels. In an embodiment of the invention, signals 1128 can include input data to be projected.

[0123] In this document, the terms “computer program medium” and “computer usable medium” are used to generally refer to media such as removable storage drive 1114, a hard disk installed in hard disk drive 1112, and signals 1128. These computer program products are means for providing software to computer system 1100. The invention is directed to such computer program products.

[0124] Computer programs (also called computer control logic) are stored in main memory 1108 and/or secondary memory 1110. Computer programs may also be received via communications interface 1124. Such computer programs, when executed, enable the computer system 1100 to perform the features of the present invention as discussed herein. In particular, the computer programs, when executed, enable the processor 1104 to perform the features of the present invention. Accordingly, such computer programs represent controllers of the computer system 1100.

[0125] In an embodiment where the invention is implemented using software, the software for performing the training and projection phases of the invention may be stored in a computer program product and loaded into computer system 1100 using removable storage drive 1114, hard drive 1112 or communications interface 1124.

[0126] In another embodiment, the invention is implemented using a combination of both hardware and software.

[0127] IV. Conclusion

[0128] While various embodiments of the present invention have been described above, it should be understood that they have been presented by way of example, and not limitation. It will be apparent to persons skilled in the relevant art that various changes in detail can be made therein without departing from the spirit and scope of the invention. Thus, the present invention should not be limited by any of the above described exemplary embodiments, but should be defined only in accordance with the following claims and their equivalents.

What is claimed is:
 1. A method of mapping a set of n-dimensional input patterns to an m-dimensional space using locally defined neural networks, comprising the steps of: (a) creating a set of locally defined neural networks trained according to a mapping of a subset of the n-dimensional input patterns into an m-dimensional output space; (b) mapping additional n-dimensional input patterns using the locally defined neural networks.
 2. The method of claim 1, wherein step (a) comprises the steps of: (i) selecting k patterns from the set of input patterns, {x_(i), i=1, 2, . . . k, x_(i) ∈ R^(n)}; (ii) mapping the patterns {x_(i)} into an m-dimensional space (x_(i)→y_(i), i=1, 2, . . . k, y_(i) ∈ R^(m)), to form a training set T={(x_(i), y_(i)), i=1, 2, . . . k}; (iii) determining c n-dimensional reference points, {c_(i), i=1, 2, . . . c, c_(i) ∈ R^(n)}; (iv) partitioning T into c disjoint clusters C_(j) based on a distance function d, {C_(j)={(x_(i), y_(i)): d(x_(i), c_(j))≦d(x_(i), c_(k)) for all k≠j; j=1, 2, . . . c; i=1, 2, . . . k}}; and (v) training c independent local networks {Net_(i)^(L), i=1, 2, . . . c}, with the respective pattern subsets C_(i).
 3. The method of claim 2, wherein said step (iii) is performed using a clustering methodology.
 4. The method of claim 2, wherein said step (b) comprises the steps of: (i) for an additional n-dimensional input pattern x ∈ R^(n), determining the distance to each reference point in {c_(i)}; (ii) identifying the reference point c_(j) closest to the input pattern x; and (iii) mapping x→y, y ∈ R^(m), using the local neural network Net_(j)^(L) associated with the reference point c_(j) identified in step (ii).
 5. The method of claim 1, wherein step (a) comprises the steps of: (i) selecting k patterns of the set of n-dimensional input patterns, {x_(i), i=1, 2, . . . k, x_(i) ∈ R^(n)}; (ii) mapping the patterns {x_(i)} into an m-dimensional space (x_(i)→y_(i), i=1, 2, . . . k, y_(i) ∈ R^(m)), to form a training set T={(x_(i), y_(i)), i=1, 2, . . . k}; (iii) determining c m-dimensional reference points, {c_(i), i=1, 2, . . . c, c_(i) ∈ R^(m)}; (iv) partitioning T into c disjoint clusters C_(j) based on a distance function d, {C_(j)={(x_(i), y_(i)): d(y_(i), c_(j))≦d(y_(i), c_(k)) for all k≠j; j=1, 2, . . . c; i=1, 2, . . . k}}; (v) training c independent local networks {Net_(i)^(L), i=1, 2, . . . c}, with the respective pattern subsets C_(j); and (vi) training a global network Net^(G) using all the patterns in T.
 6. The method of claim 5, wherein said step (iii) is performed using a clustering methodology.
 7. The method of claim 5, wherein step (b) comprises the steps of: (i) for an additional n-dimensional pattern x ∈ R^(n), mapping x→y′, y′ ∈ R^(m), using Net^(G); (ii) determining the distance of y′ to each reference point in {c_(i)}; (iii) identifying the reference point c_(j) closest to y′; and (iv) mapping x→y, y ∈ R^(m), using the local neural network Net_(j)^(L) associated with the reference point c_(j) identified in step (iii).
 8. A computer program product comprising a computer usable medium having computer readable program code means embodied in said medium for causing an application program to execute on a computer that maps a set of n-dimensional input patterns to an m-dimensional space using locally defined neural networks, said computer readable program code means comprising: a first computer readable program code means for causing the computer to create a set of locally defined neural networks trained according to a mapping of a subset of the n-dimensional input patterns into an m-dimensional space; a second computer readable program code means for causing the computer to project additional n-dimensional patterns of the input set using the locally defined neural networks.
 9. The computer program product of claim 8, wherein said first computer readable code means comprises: (i) computer readable program code means for selecting k patterns from the set of input patterns, {x_(i), i=1, 2, . . . k, x_(i) ∈ R^(n)}; (ii) computer readable program code means for mapping the patterns {x_(i)} into an m-dimensional space (x_(i)→y_(i), i=1, 2, . . . k, y_(i) ∈ R^(m)), to form a training set T={(x_(i), y_(i)), i=1, 2, . . . k}; (iii) computer readable program code means for determining c n-dimensional reference points, {c_(i), i=1, 2, . . . c, c_(i) ∈ R^(n)}; (iv) computer readable program code means for partitioning T into c disjoint clusters C_(j) based on a distance function d, {C_(j)={(x_(i), y_(i)): d(x_(i), c_(j))≦d(x_(i), c_(k)) for all k≠j; j=1, 2, . . . c; i=1, 2, . . . k}}; and (v) computer readable program code means for training c independent local networks {Net_(i)^(L), i=1, 2, . . . c}, with the respective pattern subsets C_(i).
 10. The computer program product of claim 9, wherein said computer readable program code means uses a clustering methodology.
 11. The computer program product of claim 9, wherein said second computer readable code means comprises: (i) for an additional n-dimensional pattern x ∈ R^(n), computer readable program code means for determining the distance to each reference point in {c_(i)}; (ii) computer readable program code means for identifying the reference point c_(j) closest to the input pattern x; and (iii) computer readable program code means for mapping x→y, y ∈ R^(m), using the local neural network Net_(j)^(L) associated with the reference point c_(j) identified in step (ii).
 12. The computer program product of claim 8, wherein said first computer readable program code means comprises: (i) computer readable program code means for selecting k patterns of the set of n-dimensional input patterns, {x_(i), i=1, 2, . . . k, x_(i) ∈ R^(n)}; (ii) computer readable program code means for mapping the patterns {x_(i)} into an m-dimensional space (x_(i)→y_(i), i=1, 2, . . . k), to form a training set T={(x_(i), y_(i)), i=1, 2, . . . k}; (iii) computer readable program code means for determining c m-dimensional reference points, {c_(i), i=1, 2, . . . c, c_(i) ∈ R^(m)}; (iv) computer readable program code means for partitioning T into c disjoint clusters C_(j) based on a distance function d, {C_(j)={(x_(i), y_(i)): d(y_(i), c_(j))≦d(y_(i), c_(k)) for all k≠j; j=1, 2, . . . c; i=1, 2, . . . k}}; (v) computer readable program code means for training c independent local networks {Net_(i)^(L), i=1, 2, . . . c}, with the respective pattern subsets C_(i); and (vi) computer readable program code means for training a global network Net^(G) using all the patterns in T.
 13. The computer program product of claim 12, wherein said computer readable program code means uses a clustering methodology.
 14. The computer program product of claim 12, wherein said second computer readable program code means comprises: (i) for an additional n-dimensional pattern x ∈ R^(n), computer readable program code means for mapping x→y′, y′ ∈ R^(m), using Net^(G); (ii) computer readable program code means for determining the distance of y′ to each reference point in {c_(i)}; (iii) computer readable program code means for identifying the reference point c_(j) closest to y′; and (iv) computer readable program code means for mapping x→y, y ∈ R^(m), using the local neural network Net_(j)^(L) associated with the reference point c_(j) identified in step (iii).
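By way of illustration only, the training-phase steps recited in claims 2 and 5 may be sketched as follows. The sketch assumes that the k selected patterns and their images under the iterative nonlinear mapping are already available as NumPy arrays X (k×n) and Y (k×m); it substitutes scikit-learn's KMeans for the otherwise unspecified clustering methodology and MLPRegressor for the neural networks, whose architecture the claims do not fix, and the function name train_networks is illustrative only.

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.neural_network import MLPRegressor

    def train_networks(X, Y, c=10, hidden=(16,)):
        """Train Net^G, the c local networks, and reference points (claim 5).

        X : (k, n) selected input patterns x_i
        Y : (k, m) their images y_i under the nonlinear mapping
        """
        # (iii) determine c m-dimensional reference points by clustering
        # the images y_i (output-space clustering, per claim 5).
        km = KMeans(n_clusters=c, n_init=10, random_state=0).fit(Y)
        ref_points = km.cluster_centers_
        # (iv) partition T into disjoint clusters C_j: each pair (x_i, y_i)
        # is assigned to the cluster of its nearest reference point.
        labels = km.labels_
        # (v) train one independent local network per cluster.
        local_nets = []
        for j in range(c):
            net = MLPRegressor(hidden_layer_sizes=hidden, max_iter=5000,
                               random_state=0)
            local_nets.append(net.fit(X[labels == j], Y[labels == j]))
        # (vi) train the global network Net^G on all k pairs.
        global_net = MLPRegressor(hidden_layer_sizes=hidden, max_iter=5000,
                                  random_state=0).fit(X, Y)
        return global_net, local_nets, ref_points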